54 research outputs found

Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models

Author: Asaei Afsaneh
Bourlard Herve
Dighe Pranay
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 18/10/2016

Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov models (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems correspond to context-dependent tied states or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypothesize that the senone probabilities obtained from a DNN trained with binary labels can provide more accurate targets to learn better acoustic models. However, DNN outputs bear inaccuracies which are exhibited as high-dimensional unstructured noise, whereas the informative components are structured and low-dimensional. We exploit principal component analysis (PCA) and sparse coding to characterize the senone subspaces. Enhanced probabilities obtained from low-rank and sparse reconstructions are used as soft targets for DNN acoustic modeling, which also enables training with untranscribed data. Experiments conducted on the AMI corpus show a 4.6% relative reduction in word error rate.

arXiv.org e-Print Archive

Crossref

Exploiting Low-dimensional Structures to Enhance DNN Based Acoustic Modeling in Speech Recognition

Author: Asaei Afsaneh
Bourlard Herve
Dighe Pranay
Luyet Gil
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 22/01/2016

We propose to model the acoustic space of deep neural network (DNN) class-conditional posterior probabilities as a union of low-dimensional subspaces. To that end, the training posteriors are used for dictionary learning and sparse coding. Sparse representation of the test posteriors using this dictionary enables projection to the space of training data. Relying on the fact that the intrinsic dimensions of the posterior subspaces are indeed very small and the matrix of all posteriors belonging to a class has a very low rank, we demonstrate how low-dimensional structures enable further enhancement of the posteriors and rectify the spurious errors due to mismatched conditions. The enhanced acoustic modeling method leads to improvements in a continuous speech recognition task using the hybrid DNN-HMM (hidden Markov model) framework in both clean and noisy conditions, where up to 15.4% relative reduction in word error rate (WER) is achieved.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Ad Hoc Microphone Array Calibration: Euclidean Distance Matrix Completion Algorithm and Theoretical Guarantees

Author: Asaei Afsaneh
Bourlard Herve
Garner Philip N.
Parhizkar Reza
Taghizadeh Mohammad J.
Publication date: 07/03/2014

This paper addresses the problem of ad hoc microphone array calibration where only partial information about the distances between microphones is available. We construct a matrix consisting of the pairwise distances and propose to estimate the missing entries with a novel Euclidean distance matrix (EDM) completion algorithm that alternates between low-rank matrix completion and projection onto the Euclidean distance space. This approach confines the recovered matrix to the EDM cone at each iteration of the matrix completion algorithm. Theoretical guarantees on the calibration performance are obtained for random and locally structured missing entries as well as for measurement noise on the known distances. This study elucidates the links between the calibration error and the number of microphones, the noise level, and the ratio of missing distances. Thorough experiments on real data recordings and simulated setups are conducted to demonstrate these theoretical insights. The proposed Euclidean distance matrix completion algorithm achieves a significant improvement over state-of-the-art techniques for ad hoc microphone array calibration. Comment: In Press, available online, August 1, 2014. http://www.sciencedirect.com/science/article/pii/S0165168414003508, Signal Processing, 201

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Interpretation of Multiparty Meetings: The AMI and AMIDA Projects

Author: Bourlard Herve
Hain Thomas
Renals Steve
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

The AMI and AMIDA projects are collaborative EU projects concerned with the automatic recognition and interpretation of multiparty meetings. This paper provides an overview of the advances we have made in these projects, with a particular focus on the multimodal recording infrastructure, the publicly available AMI corpus of annotated meeting recordings, and the speech recognition framework that we have developed for this domain.

CiteSeerX

Edinburgh Research Archive

Edinburgh Research Explorer

Posterior features applied to speech recognition tasks with user-defined vocabulary

Author: Guillermo Aradilla
Herve Bourlard
Mathew Magimai-Doss
Publication venue: IEEE
Publication date: 01/01/1960

Crossref

University of Wisconsin, Milwaukee: UWM Libraries Digital Collections

Keyword Detection for Spontaneous Speech

Author: Billard Aude
Bourlard Herve
Li Weifeng
Publication date: 02/07/2009

This paper presents a system for keyword detection in spontaneous speech. Keywords are predefined through a set of acoustic examples provided by the users. Keyword detection proceeds in two steps: keyword searching and verification. To address the problem of using the same phoneme models in both keyword and filter models, we propose to remove the phoneme models included in the keyword model from the filter models. To reduce the false alarms caused by the keyword searching step, dynamic time warping (DTW) based template matching and Gaussian mixture models (GMM) are proposed. Our keyword detection experiments demonstrate the effectiveness of the proposed methods by yielding improved detection performance compared to the baseline system.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Connectionist probability estimators in HMM speech recognition

Author: Bourlard Herve
Cohen Michael
Franco Horacio
Morgan Nelson
Renals Steve
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/1994

The authors are concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical interpretation of connectionist networks as probability estimators. They review the basis of HMM speech recognition and point out the possible benefits of incorporating connectionist networks. Issues necessary to the construction of a connectionist HMM recognition system are discussed, including the choice of connectionist probability estimator. They describe the performance of such a system using a multilayer perceptron probability estimator evaluated on the speaker-independent DARPA Resource Management database. In conclusion, they show that a connectionist component improves a state-of-the-art HMM system.

Edinburgh Research Archive